Begin with the g4g example, but match this class's style.
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
from scipy import stats
xs = [5,7,8,7,2,17,2,9,4,11,12,9,6]
ys = [99,86,87,88,111,86,103,87,94,78,77,85,86]
m, b, _, _, _ = stats.linregress(xs, ys)
plt.scatter(xs, ys)
plt.plot(xs, [m * x + b for x in xs])
plt.show()
For error, you computed sums of squared errors. Well done. You probably did something like the following:
errors = [(m * xs[i] + b - ys[i]) ** 2 for i in range(len(xs))]
print(errors)
[np.float64(21.626948352384407), np.float64(23.49288828029534), np.float64(4.39178485240792), np.float64(8.1051031441658), np.float64(129.88283706421814), np.float64(160.4258038281837), np.float64(11.536994532945045), np.float64(0.11859128985571113), np.float64(4.413400213657534), np.float64(34.12657393735711), np.float64(25.91326891120767), np.float64(5.496074733564338), np.float64(43.53669186049348)]
We can take the sum of squared errors by just adding these all up! Python has a handy built-in function to do that, or you can write your own.
print(sum(errors))
473.0669610007362
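If you'd rather write your own instead of using the built-in, a minimal sketch of what sum() does under the hood (my sketch, not code anyone turned in):

```python
# A hand-rolled version of the built-in sum(), for comparison.
def my_sum(values):
    total = 0
    for v in values:
        total += v
    return total

print(my_sum([1.5, 2.5, 3.0]))  # 7.0, same as sum([1.5, 2.5, 3.0])
```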
Okay, so now how does machine learning work? Well, we pick a random m and b.
import random
I was fairly surprised students used random() on its own, without adding to or multiplying the value it returns, which is always between zero and one.
After all, we've been working with functions that take values in that range and scale or shift them much closer to our data, so we know what that looks like.
Pick a random point, find the slope and intercept of the line to the previous point, and that is our first guess.
pt = int(random.random() * len(xs))
# I had Gemini do this.
m = (ys[pt] - ys[pt - 1]) / (xs[pt] - xs[pt - 1])
b = ys[pt] - m * xs[pt]
pt, m, b
(4, -4.6, 120.2)
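One thing the generated snippet doesn't handle: if xs[pt] happens to equal xs[pt - 1], the slope computation divides by zero. A hedged fix (my sketch, not the class code) is to re-pick until the pair is usable:

```python
import random

xs = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
ys = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Re-pick the point until its x differs from the previous point's x,
# so the slope formula never divides by zero.
pt = int(random.random() * len(xs))
while xs[pt] == xs[pt - 1]:
    pt = int(random.random() * len(xs))

m = (ys[pt] - ys[pt - 1]) / (xs[pt] - xs[pt - 1])
b = ys[pt] - m * xs[pt]
print(pt, m, b)
```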
Actually, forget machine learning for a moment. Let's think about this: what are the largest and smallest possible m and b values under this model?
# you don't need to know how to write this, but we had a lecture on how you could understand it.
ms = [(ys[pt1] - ys[pt2]) / (xs[pt1] - xs[pt2]) for pt2 in range(len(xs)) for pt1 in range(len(xs)) if xs[pt1] != xs[pt2]]
print(max(ms), min(ms))
5.0 -13.0
bs = [ys[i//len(xs)] - ms[i] * xs[i//len(xs)] for i in range(len(ms))]
print(max(bs), min(bs))
233.0 42.0
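One subtlety worth noting: the bs comprehension re-derives which point each slope came from with integer division, but the `if` filter in ms drops the pairs with equal x values, so the pairing can drift. A sketch (my addition, and its b extremes may differ slightly from the numbers above) that keeps each slope with its own intercept from the start:

```python
xs = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
ys = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Build each slope together with the intercept of the same two-point line,
# so the pairing can never drift.
pairs = []
for p2 in range(len(xs)):
    for p1 in range(len(xs)):
        if xs[p1] != xs[p2]:
            m = (ys[p1] - ys[p2]) / (xs[p1] - xs[p2])
            pairs.append((m, ys[p1] - m * xs[p1]))

print(max(m for m, b in pairs), min(m for m, b in pairs))  # 5.0 -13.0
print(max(b for m, b in pairs), min(b for m, b in pairs))
```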
Well that seems easy enough. Let's simply compute the sum of squared errors for all possible pairs of m and b values and plot it. We have the code from last class (via Gemini) to take these two values and determine the sum of squared errors, or we can write it ourselves.
sse = lambda m, b: sum([(m * xs[i] + b - ys[i]) ** 2 for i in range(len(xs))])  # Sum of Squared Error
print(sse(5,223), sse(-13,42))
391505 306964
I want to note something I do here - I'm not making a list! I'm making a list of lists. This list of lists is a LOT like an image. In fact, I'm going to make it into an image now that I've said that...
sses = [[sse(ms[i], bs[j]) for i in range(len(ms))] for j in range(len(bs))]
I wonder what the minimum and maximum SSE are. It's a little bit annoying to find with the two dimensions.
print(max([max(i) for i in sses]), min([min(i) for i in sses]))
437345.0 473.09566326530614
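As an aside, numpy flattens that two-dimensional annoyance away. A sketch with a tiny stand-in grid (swap in the real sses):

```python
import numpy as np

# Stand-in for the sses list of lists built above.
sses = [[3.0, 1.0], [9.0, 4.0]]

arr = np.array(sses)
print(arr.max(), arr.min())  # 9.0 1.0
```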
Oh I could look at it as a dataframe or even as an image!
import pandas as pd
import numpy as np
from PIL import Image as im
df = pd.DataFrame(sses)
df.head()
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5972.500000 | 3181.250000 | 3411.500000 | 3181.250000 | 15138.854167 | 13471.916667 | 5435.250000 | 92803.250000 | 4067.500000 | 4995.331633 | ... | 28491.500000 | 45595.250000 | 5151.687500 | 23753.250000 | 2918.687500 | 26858.583333 | 3181.250000 | 11826.530000 | 12427.500000 | 20861.916667 |
| 1 | 10528.750000 | 1550.000000 | 5492.750000 | 1550.000000 | 6288.854167 | 5240.666667 | 1329.000000 | 68897.000000 | 1198.750000 | 1242.653061 | ... | 15722.750000 | 29114.000000 | 9089.187500 | 12222.000000 | 1906.187500 | 14502.333333 | 1550.000000 | 4255.280000 | 4608.750000 | 10155.666667 |
| 2 | 7307.500000 | 2041.250000 | 3756.500000 | 2041.250000 | 11111.354167 | 9691.916667 | 3305.250000 | 82753.250000 | 2432.500000 | 3006.760204 | ... | 22896.500000 | 38515.250000 | 6239.187500 | 18653.250000 | 2026.187500 | 21428.583333 | 2041.250000 | 8310.530000 | 8812.500000 | 16091.916667 |
| 3 | 10528.750000 | 1550.000000 | 5492.750000 | 1550.000000 | 6288.854167 | 5240.666667 | 1329.000000 | 68897.000000 | 1198.750000 | 1242.653061 | ... | 15722.750000 | 29114.000000 | 9089.187500 | 12222.000000 | 1906.187500 | 14502.333333 | 1550.000000 | 4255.280000 | 4608.750000 | 10155.666667 |
| 4 | 20978.923611 | 4781.423611 | 13055.423611 | 4781.423611 | 1098.402778 | 772.090278 | 1672.923611 | 46140.923611 | 2986.423611 | 1999.076672 | ... | 5960.423611 | 15020.423611 | 18817.486111 | 3903.423611 | 5859.486111 | 5221.256944 | 4781.423611 | 556.703611 | 621.423611 | 2799.590278 |
5 rows × 150 columns
im.fromarray(np.array(sses).astype(np.uint8)).show()
These all look bad because we aren't forcing the error values to be within the typical range of colors. Let's do that.
Colors should be between 0 and 255.
Our errors range from about 473 to 437,345.
We divide by 437,345 and multiply by 250, more or less.
im.fromarray(np.array([[i//(437345//250) for i in j] for j in sses]).astype(np.uint8)).show()
We can't see anything because we shouldn't actually be using linear scaling (most likely). We need something better, like a square root.
437345 ** .5
661.3206483998515
Not bad. We take the square root and divide by 3.
im.fromarray(np.array([[(i**.5)//3 for i in j] for j in sses]).astype(np.uint8)).show()
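As a quick aside, the nested comprehension that rescales sses can be written with numpy broadcasting instead. A sketch with a tiny stand-in grid (swap in the real sses):

```python
import numpy as np

# Stand-in grid spanning roughly the real error range.
sses = [[473.0, 40000.0], [160000.0, 437345.0]]

arr = np.array(sses, dtype=float)
scaled = np.sqrt(arr) // 3   # same square-root-then-divide-by-3 scaling
print(scaled)                # [[  7.  66.] [133. 220.]]
```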
Now that is getting somewhere. Still hard to see though. I wonder if plotly can solve this problem for us.
# prompt: plotly 3d surface plot of sses
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='notebook'
# Create the surface plot
fig = go.Figure(data=[go.Surface(z=sses)])
# Customize the plot
fig.update_layout(title='Sum of Squared Errors',
                  scene=dict(
                      xaxis_title='m',
                      yaxis_title='b',
                      zaxis_title='SSE'
                  ))
# Display the plot
fig.show()
Going to smooth it out a little.
go.Figure(data=[go.Surface(z=[[(i**.5)//3 for i in j] for j in sses])]).show()
Our job as machine learning people is to find where the error is lowest. We find it by picking a point and figuring out which direction to move to get somewhere lower.
This is called gradient descent, and it is probably the most powerful computational technique known at this point in history. It is much faster than computing every error, as we have done today, especially on meaningfully large data sets, but this is a good way to visualize how it works.
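To make that concrete, here is a minimal gradient descent sketch for this exact line-fitting problem (my addition, not the class implementation): start from a guess and repeatedly step both parameters against the partial derivatives of the SSE.

```python
xs = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
ys = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

def sse(m, b):
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))

m, b = 0.0, 0.0       # starting guess
rate = 0.0005         # step size; too large diverges, too small crawls
for _ in range(100000):
    # Partial derivatives of the SSE with respect to m and b.
    dm = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys))
    db = sum(2 * (m * x + b - y) for x, y in zip(xs, ys))
    m -= rate * dm
    b -= rate * db

print(m, b, sse(m, b))  # lands near linregress's fit, SSE near 473.07
```

Each step moves downhill on the surface we just plotted, which is why the bowl shape matters: there is one lowest point, and the slope of the surface always points away from it.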